ABSTRACT

The equating of scores on alternate forms of different achievement tests
through the use of the three-parameter latent trait model, item-response
theory (IRT) equating, was compared with the results of score equatings
based on conventional linear and curvilinear equating models. Ten equatings
were completed for pairs of alternate forms of the Advanced Placement
Program, which measures different content areas and traits in each subject
area. Despite the apparent violation of the unidimensionality assumption,
the equating results obtained through the IRT equating model were found to
be in agreement with those of the conventional equating models. By
demonstrating that the IRT equating results parallel those of the simpler,
less costly conventional methods, it has been shown that it is still
possible to equate scores on non-parallel tests under conditions which make
conventional equating inapplicable. (Author/PN)
Purpose of the Study



The equating of scores on non-parallel forms of a test through the
application of the three-parameter latent trait model (Lord, 1980),
hereafter referred to as item-response theory (IRT) equating, has been shown
(Marco, Petersen and Stewart, 1980; Petersen, Cook and Stocking, 1981) to
yield at least as accurate, and in some instances more accurate, results
than those of conventional linear and curvilinear equating models (Angoff,
1971) for the College Board Scholastic Aptitude Test (SAT). In the second
study, Petersen et al. investigated the drift in the SAT score scale by
comparing the results obtained from the conventional and IRT equating
methods. Their study design involved the equating of a test to itself in a
circular chain through a series of links (e.g., a -> b -> c -> d -> e -> f
-> a) in which each new test is equated to a previous one through an anchor
test common to the adjacent pair of tests being equated. The extent of the
scale drift was then determined as the difference between the scaled-score
conversions for each raw score on test a at the start and at the end of the
circular chain. They concluded from their results that the smallest scale
drift occurred under the IRT equating method.
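The scale-drift computation described above can be sketched as follows. This
is an illustration under simplifying assumptions, not the procedure of
Petersen et al.: each link is taken to be a linear raw-to-raw conversion with
made-up parameters, and drift is measured on the raw-score scale of test a
rather than on the scaled-score conversions. All function names are
hypothetical.

```python
from typing import Callable, List

Conversion = Callable[[float], float]


def linear(slope: float, intercept: float) -> Conversion:
    """Return a linear conversion y = slope * x + intercept (one link)."""
    return lambda x: slope * x + intercept


def chain(links: List[Conversion]) -> Conversion:
    """Compose the links in order, carrying a score around the whole chain."""
    def carried(x: float) -> float:
        for link in links:
            x = link(x)
        return x
    return carried


def scale_drift(links: List[Conversion], raw_scores: List[float]) -> List[float]:
    """Drift = score after the full circular chain minus the starting score."""
    around = chain(links)
    return [around(x) - x for x in raw_scores]


# Hypothetical chain a -> b -> a whose two links are exact inverses,
# so the drift is zero at every raw score.
no_drift_links = [linear(2.0, 1.0), linear(0.5, -0.5)]
no_drift = scale_drift(no_drift_links, [0.0, 20.0, 40.0])

# Hypothetical chain a -> b -> c -> a with slightly inconsistent links.
drifting_links = [linear(1.02, -0.5), linear(0.99, 0.3), linear(0.995, 0.1)]
drift = scale_drift(drifting_links, [0.0, 20.0, 40.0])
```

A method with less drift is one for which the values returned by
`scale_drift` stay closer to zero across the raw-score range.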

If IRT equating works for the SAT, can it also work for achievement tests?
Achievement tests, in general, may not satisfy the assumption of
unidimensionality which underlies the use of latent trait models. Therefore,
the primary purpose of this investigation is to explore the extent to which
IRT equating results parallel those of conventional equating methods under
conditions which probably violate the unidimensionality assumption.



This investigation was supported by the College Entrance Examination Board
through its testing and research programs. The author wishes to thank Martha
Stocking for assisting with the LOGIST runs and Samuel Livingston for his
helpful comments on the draft of this report.



Another reason for exploring the feasibility of IRT equating for different
types of achievement tests is that, under the current test-disclosure
environment, it may not even be possible to locate a single previous edition
with a sufficient number of items in common with a new edition to allow for
the use of conventional equating models. But IRT equating requires only that
a sufficient number of items on a new test edition will have been calibrated
and placed on a common ability scale. Therefore, IRT equating could still be
accomplished even if the calibrated items on the new test edition had been
drawn from several previous editions.



Design of the Study



The multiple-choice sections of 11 achievement examinations administered in
the College Board Advanced Placement program were used for the study. Two
were 45-minute examinations in Physics C (Mechanics, and Electricity and
Magnetism); the remaining nine were 75- to 90-minute examinations.

The equated scores on two editions, A and B, of each achievement examination
were determined by three equating methods: the conventional linear and
equipercentile equating methods described by Angoff (1971, pp. 568-83) and
the three-parameter IRT equating method. For a given test A score, the
equated test B scores obtained under the two conventional equating methods
were then compared with the corresponding test B score obtained under the
IRT equating method.
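As a minimal sketch of the two conventional methods, and not the procedure
used in this study, the following Python fragment equates a Form A score to
Form B under the simplifying assumption that both forms were taken by
equivalent groups. The study itself used anchor-test designs (Angoff, 1971),
which adjust for differences between groups and are not reproduced here. The
function names and the toy score lists are hypothetical.

```python
from statistics import mean, pstdev


def linear_equate(x, scores_a, scores_b):
    """Linear equating: match standardized deviates on the two forms."""
    slope = pstdev(scores_b) / pstdev(scores_a)
    return mean(scores_b) + slope * (x - mean(scores_a))


def percentile_rank(x, scores):
    """Percent of examinees below x plus half of those exactly at x."""
    below = sum(s < x for s in scores)
    at = sum(s == x for s in scores)
    return 100.0 * (below + 0.5 * at) / len(scores)


def equipercentile_equate(x, scores_a, scores_b, max_b):
    """Form B score whose percentile rank equals that of x on Form A."""
    target = percentile_rank(x, scores_a)
    ranks = [percentile_rank(y, scores_b) for y in range(max_b + 1)]
    # Interpolate linearly between adjacent Form B score points.
    for y in range(max_b):
        lo, hi = ranks[y], ranks[y + 1]
        if hi > lo and lo <= target <= hi:
            return y + (target - lo) / (hi - lo)
    # Target lies outside the observed range of Form B ranks.
    return 0.0 if target < ranks[0] else float(max_b)


# Toy data: Form B scores run two points higher than Form A scores.
toy_a = [1, 2, 3, 4, 5]
toy_b = [3, 4, 5, 6, 7]
lin = linear_equate(3, toy_a, toy_b)
equi = equipercentile_equate(3, toy_a, toy_b, max_b=10)
```

With score distributions of the same shape, as in the toy data, the two
methods give the same equated score; they diverge when the shapes differ,
which is why the equipercentile method is described as curvilinear.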

All three equating procedures used internal anchor tests ranging from 14 to
30 questions. For the IRT equating, the internal anchor test was used to
transform the item parameters for each total test to a common ability scale.
The program LOGIST (Wood, Wingersky & Lord, 1976; Wood & Lord, 1976) was
used to obtain the item parameter estimates from which the true-score
equating of raw scores on tests A and B was accomplished.
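The true-score equating step can be illustrated with a short sketch, assuming
the item parameters of both forms are already on a common ability scale. It
does not reproduce LOGIST; the scaling constant D = 1.7, the bisection
search, the function names, and the item parameters are all assumptions made
for illustration.

```python
import math


def p3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def true_score(theta, items):
    """Expected number-right score: the sum of the item probabilities."""
    return sum(p3pl(theta, a, b, c) for a, b, c in items)


def theta_for_true_score(score, items, lo=-8.0, hi=8.0, tol=1e-8):
    """Bisection for the ability whose true score equals `score`.

    Relies on the true score increasing with theta (all a > 0).
    """
    chance_level = sum(c for _, _, c in items)
    if not chance_level < score < len(items):
        raise ValueError("score is outside the range of attainable true scores")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if true_score(mid, items) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def irt_true_score_equate(score_a, items_a, items_b):
    """Form B true score at the ability that yields `score_a` on Form A."""
    theta = theta_for_true_score(score_a, items_a)
    return true_score(theta, items_b)


# Hypothetical (a, b, c) parameters on a common scale.
form_a_items = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 1.0, 0.2)]
# A hypothetical Form B whose items are uniformly half a unit harder.
form_b_items = [(a, b + 0.5, c) for a, b, c in form_a_items]
equated = irt_true_score_equate(2.0, form_a_items, form_b_items)
```

Scores at or below the sum of the c parameters have no corresponding ability
under this model, which is why the sketch rejects them; how such scores are
handled in practice is outside what this example shows.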

Although it would have strengthened the study to confirm by factor analytic
methods that the exams used in the study are not unidimensional, the
diversity of the content areas encompassed by some of those exams leaves
little doubt about their being far from unidimensional. The 120-item biology
exam, for example, was made up of questions in three specific content areas:
organismal, molecular and populational biology, each area testing knowledge
of facts, principles and processes of biology, understanding the means by
which biological information is collected, how it is interpreted, and how
one formulates hypotheses from available data and makes further predictions.
The chemistry exam contained questions on structure of matter, states of
matter, chemical reactions, and descriptive chemistry. The questions dealt
with understanding and application of principles or calculations or
observations and conclusions in experimental situations, etc. The physics
exam tested knowledge of physics and the ability to interpret and apply the
knowledge both qualitatively and quantitatively, determine directions of
vectors or paths of particles or light rays, draw or interpret diagrams,
account for observed phenomena, interpret or express physical relationships
in graphical forms, manipulate equations and solve problems. The foreign
language exams, comprising listening, reading, writing and speaking
components, tested the ability to comprehend formal and informal spoken
language, the acquisition of vocabulary and a grasp of structure as well as
the ability to express ideas orally with accuracy and fluency.






The conventional and the IRT equatings used independent representative
samples from the total candidate groups for tests A and B. Part of the
reason for not using the same sample is that equating was done long after
the operational program administration. Also, since the cost of LOGIST is
directly related to sample size, it was necessary to reduce the size of some
of the IRT equating samples.

Table 1 shows the examinations used for the equatings, the total number of
items in the two editions, A and B, of each examination, the number of
common items, and the number of students in the equating samples and the
total candidate group for each test edition.






Equating Results 



Tables 2.a. through 2.e. show the equivalent scores on Form B for each of
the three equating methods for selected score points on Form A. The linear
conversion parameters for transforming the Form A scores to their equivalent
Form B scores are indicated at the bottom of the tabulations for each
examination. These parameters were derived from the Tucker observed-score
linear equating model in preference to the calculations which had also been
obtained by applying the Levine equating model (Angoff, 1971). The decision
rule as to which of the two linear models' equating results should be used
for score reporting depends on the differences in ability level between the
groups that took the test editions being equated as well as on the degree of
parallelism between the tests. For non-parallel tests administered to groups
that are not widely discrepant in ability (as is usually the case for the
caliber of total-group candidates for the Advanced Placement program) the
Tucker linear model was indicated for score reporting.
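The linear conversions printed beneath the tabulations can be applied as in
the following sketch, which uses the Calculus AB parameters, FORM B =
0.9819(A) - 1.9861, on a 45-point Form B. The function names are
hypothetical, and reading a tabled entry such as 0(-2) as a reported score
followed by the out-of-range computed value is an assumption about the
tables' notation.

```python
def convert_linear(a_score, slope, intercept, max_b):
    """Apply FORM B = slope * (A) + intercept and round to a whole score.

    Returns (reported, computed): `computed` is the rounded conversion and
    `reported` is that value held within the 0..max_b score range.
    """
    computed = round(slope * a_score + intercept)
    reported = min(max(computed, 0), max_b)
    return reported, computed


def show_conversion(a_score, slope, intercept, max_b):
    """Format an entry, flagging out-of-range values as reported(computed)."""
    reported, computed = convert_linear(a_score, slope, intercept, max_b)
    if reported == computed:
        return str(reported)
    return f"{reported}({computed})"


# Calculus AB parameters as printed with Table 2.c.
SLOPE, INTERCEPT, MAX_B = 0.9819, -1.9861, 45
entries = {a: show_conversion(a, SLOPE, INTERCEPT, MAX_B) for a in (45, 20, 0)}
```

For Form A scores of 45, 20 and 0 this yields 42, 18 and 0(-2), which agree
with the Calculus AB linear column.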



Tables 2.a. through 2.e. show that the results of the different equating
methods are in very close agreement, not differing by more than one point,
except at the two extremities of each scale where score equivalences are not
usually as accurate because of the scarcity of data at those score levels.
These observations are further confirmed by the graphs of the equated scores
in Figures A-K. The close agreement between the results of the three
equating methods, particularly those of the IRT and equipercentile methods,
confirms that the IRT equating method can be used to generate scores that
are equivalent to those of conventional equating methods.
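A check of this kind can be sketched as follows, using the Calculus BC
columns of Table 2.c. as best they can be read from the tabulation, with
out-of-range entries taken at their reported value of 0. The helper name is
hypothetical, and the transcription of the columns is an assumption rather
than a verified copy of the table.

```python
# Calculus BC, selected Form A scores and their Form B equivalents.
form_a = [45, 40, 36, 30, 25, 20, 18, 13, 10, 5, 0]
irt = [45, 40, 35, 29, 23, 18, 16, 11, 8, 3, 0]
equipercentile = [45, 40, 35, 29, 23, 18, 16, 10, 7, 3, 0]
linear = [44, 39, 35, 28, 23, 18, 16, 10, 7, 2, 0]


def max_abs_diff(xs, ys):
    """Largest absolute difference between two columns of equated scores."""
    return max(abs(x - y) for x, y in zip(xs, ys))


agreement = {
    "IRT vs equipercentile": max_abs_diff(irt, equipercentile),
    "IRT vs linear": max_abs_diff(irt, linear),
    "equipercentile vs linear": max_abs_diff(equipercentile, linear),
}
```

For these columns every pairwise difference is at most one point.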

Conclusion

Although the unidimensionality of the tests used in this equating experiment
was not directly tested, the wide diversity of their content specifications,
the behavioral aspects of the skills and abilities tested as well as the
multidimensionality of corresponding tests for similar ability groups
clearly suggest that one could not safely assume that the tests used for
this study are unidimensional. Despite the apparent violation of that
assumption, the equating results obtained through the IRT equating model
were found to be in agreement with those of the conventional equating
models. The application of factor analytic procedures to demonstrate the
multidimensionality of the tests used in this investigation would have
strengthened the study. It is, however, recommended that a replication of
the present study include a design for establishing the extent of scale
drift under each of the three equating models by equating a test to itself
through a series of intermediate tests in a cyclical chain link.



By demonstrating that the IRT equating results parallel those of the
simpler, less costly, conventional methods, it has been shown that it is
still possible to equate scores on non-parallel tests under conditions which
make conventional equating inapplicable. Such a situation will arise when
anchor items embedded in a new test cannot be drawn from a single previous
edition but from several previous editions containing calibrated items.
Table 1



Tests and Equating Samples 



EXAMINATION

 1. AMERICAN HISTORY       A(100) -> B(79) @
 2. BIOLOGY                A(120) -> B(120)
 3. CHEMISTRY              A(80) -> B(80)
 4. EUROPEAN HISTORY       A(110) -> B(90)
 5. FRENCH LANGUAGE        A(100) -> B(100)
 6. MATH: CALCULUS AB      A(45) -> B(45)
 7. MATH: CALCULUS BC      A(45) -> B(45)
 8. PHYSICS B              A(68) -> B(70)
 9. PHYSICS C (MECH.)      A(35) -> B(35)
10. PHYSICS C (E&M)        A(35) -> B(35)
11. SPANISH LANGUAGE       A(90) -> B(90)



No. of
Common
Equating
Items

21 
30 

21 



CONVENTIONAL EQUATING

Number of Students

Old Form (A)    New Form (B)
Sample          Sample



IRT EQUATING 



Number of Students 



4,901
4,843
6,084
5,799



23 '. 


' 1,550* 


45 ' 


~ 3,277 


15 


6,524 


23- 


> 

1,605 


14 


1,462 


U ' 


1,220 


27 


1,056* 




— :x- 



4,8-47 

• %• 

5,422 



3,219 
3,245 
1,692* 
2,949 
2,971 
'1,647 
.1,402 
1,057 . 
1,249* 



New Form (B)

Sample (Total Grp)    Sample (Total Grp)

3,il4 (28,079)' 



Old Form (A)

1,782 (21,080)' 
1,614'. (10,377) 



3,048„ ( 6,1881 

, 2T899 ( 5,87f) 

1,'533* ( 1.574) 

1,869 v (13,885) 



3,165 
2,694 
3,982 



(12,782) 
( 8,084) 
( 7,965V 



2,775* ( 2,775)' 
3„092 (15,581) 



3,259 


( 6,616) 


3.B50 ( 


1,604 


(1,610) 


2,385 ( 


1,460 


■ ( 1,489).. v 


2-,t)96 ( 


'1,222 


. ( 1, 240) * . 


1,669' • ( 


1,040* 


( 1,066) 


2,805*. ( 

@ Number of items, e.g., 100 in Form A and 79 in Form B for American
History.

* Available "standard" group, i.e., those candidates who are non-native
speakers and who have spent less than 1 month in a French- or
Spanish-speaking country.




Table 2.a.

Comparison of Raw to Raw Score Conversions
Obtained from Conventional and IRT Equating Methods

AMERICAN HISTORY

EQUIVALENT FORM B SCORE
(MAX. POSS. 79)

EUROPEAN HISTORY

EQUIVALENT FORM B SCORE
(MAX. POSS. 90)

FORM A (100 max.)    IRT    EQUIPERCENTILE    LINEAR*

FORM A (110 max.)    IRT    EQUIPERCENTILE    LINEAR*


100 


•79 


79' 


73 1 


• 110- • , 


9t> 


J- ^ 


90 




'69 


69 


66 • 


4 100 . 


. 8 S 2 


83 




• 80 


6.0 


60 


58 ' • 


■ ; • • e - 90 ' . 

80 ' 


73 


74 . 


74* 


74 


55 


54 




65 


66- 


66 


* 70 


51 


51 


51 


* ' '72 


5$ . 


60 


' 60 * 


> - 60 . 


*44 


43 

e • 


44 


60 


• >5 °: 


51 . 


50 * 


-59 


43 




- 43 . 


•50 

..««•• 


42 


42 


. 41 \ 


50 


0 

36 


36 


- ^ 6 


» 40 • 


35 


. * 34 


•"'•33 


A3 


33 


32 


' ' 33>, 


:- •'•s.. 


31 


3d 


' 30 . 


.. '40 


29. 


. 29 


29 


' 30 * 


-26 


25 ' 


'' 7 25 


3<>\ 
22\ 


- , 23 
.17 


.22 <» 


22 
'., 16 


- ' * ' 24. 
*» 20 


21 
17 


20 J 
16 

i 


20 

17 - 


16 


13' 


• i 13' 


11 


» , 10 


8 


7 


, 8 


* 15 


12 

' 9 


12 ' 
9 


11 . 
7 ' 


5 " 

■ > -9 


* 4 
0 


1 

0(-l) 


4 
0 


* 6 


6 
2 


' 6 

3. ' 


4 ' 
0 


*FORM B = 0.7327(A) - 0.3762

*FORM B = 0.8250(A) + 0.4954





Table 2.b.

Comparison of Raw to Raw Score Conversions
Obtained from Conventional and IRT Equating Methods

BIOLOGY

EQUIVALENT FORM B SCORE
(MAX. POSS. 120)

CHEMISTRY

EQUIVALENT FORM B SCORE
(MAX. POSS. 80)

FORM A (120 max.)    IRT    EQUIPERCENTILE    LINEAR* (TUCKER)

FORM A (80 max.)     IRT    EQUIPERCENTILE    LINEAR* (TUCKER)




120 


120 120 


118 


80 

A 

75 


80 


75 73 




iio 


110 110 


108 


73 


71 68 ' 




100 „ 


99 -98; 




70 


0 / 






90 • ' 


• 87 . 88 ' 


' 88 


60 


55 


54 ■ 54 




J - 35 ' <~. 

'^V "J > 


' 82 " ""¥3 


■ 83' 


59 


54 


53 " ' 53 > 




70 ■ 


. 67 . " 68 




. ' 50 . : 


45 


45 45 




60 


57/ » ' 58 


'■ 58 


48 - 


43 


/ 43 43 




• • ." 47 ' 


; 44 .4*4 


44 " 


- .40-;.. . 


35 


35 . 35 




' 40 


38 38 


"'37 u . -' 




28 


28. * . 28 




31 


. 29! 28 


, 28 


25 

* •* 


21 


2^U . 21 




20 


...19 ' 17. 


. 17 


* • 

: -22 




19- ' 18 


J* 


..«.« *** ' 10 


10 . 8 ' 


7 


20 




17 .' ' 16 

}- < * , * 




0 


i 0 


OC-3) 


'10' 


-8 


; . "8; - i 7 • 










" . ' 5 


4 


"4 2 






• * 


• 


. • : ■ o 

* 
* 


• "0 

V 


/' ^ 0 4 0(-2r) 


*FORM B = 1.0143(A) - 3.2529

*FORM B = 0.9375(A) - 2.3118























Table 2.c.

Comparison of Raw to Raw Score Conversions
Obtained from Conventional and IRT Equating Methods

CALCULUS AB

EQUIVALENT FORM B SCORE
(MAX. POSS. 45)

FORM A                 EQUIPER-    LINEAR*
(45 max.)     IRT      CENTILE     (TUCKER)

   45          45         44          42
   40          39         38          37
   36          34         37          33
   29          26         26          26
   25          22         22          23
   20          17         17          18
   15          12         12          13
   12          10         10          10
   10           8          9           8
    5           4          5           3
    0           0          0         0(-2)

*FORM B = 0.9819(A) - 1.9861

CALCULUS BC

EQUIVALENT FORM B SCORE
(MAX. POSS. 45)

FORM A                 EQUIPER-    LINEAR*
(45 max.)     IRT      CENTILE     (TUCKER)

   45          45         45          44
   40          40         40          39
   36          35         35          35
   30          29         29          28
   25          23         23          23
   20          18         18          18
   18          16         16          16
   13          11         10          10
   10           8          7           7
    5           3          3           2
    0         0(-1)        0         0(-3)

*FORM B = 1.0556(A) - 3.2515



Table 2.d.

Comparison of Raw to Raw Score Conversions
Obtained from Conventional and IRT Equating Methods

PHYSICS B                                     PHYSICS C (MECHANICS)

EQUIVALENT FORM B SCORE                       EQUIVALENT FORM B SCORE
(MAX. POSS. 70)                               (MAX. POSS. 35)

FORM A              EQUIPER-   LINEAR*        FORM A              EQUIPER-   LINEAR*
(68 max.)    IRT    CENTILE    (TUCKER)       (35 max.)    IRT    CENTILE    (TUCKER)



'68 


70 


66 




70 




35 


35 . 35 




60 


62 . 


66 




62 




30 


29 , 29 


29 


50 


52 1 


59 




52 




25 ' , 


24 24 


24 


43 


45 


46 




44 


* 


,21 


20 20 




40 


42 


42 




41 




14 


13 14 


13 


33 


35 


35 




34 




10 


9*9* 




20 


. 21 


21 




21 




6 


6 ' 6; 


6 


16. 


17 


17 




17 




5 


e S 


5 


10 


11 


11 




>. 11 




0 


'0 °o 


0 


5 




6 




. 6 










0 


1. 


0 




1 










FORM B 




4. 


.5478 






*FORM B = 0.9857(A) - 0.3743








PHYSICS C (ELECTRICITY & MAGNETISM)












•35 




'35"' 


34 


35 ■ 










30 




30 


30 * 


30" 










25 




25 


25' 


25 










16 




16 


: . 16 


16 










11 




11' 


11 


11 










' 7 




7 


' 6 


7 . • 










4 






4 


4 










1 




1 


• 1 

* 


' 0 (.47) '•' . "« 

> 










*FORM B = 1.0163(A) - 0.5462















Table 2.e.

Comparison of Raw to Raw Score Conversions
Obtained from Conventional and IRT Equating Methods

FRENCH LANGUAGE

EQUIVALENT FORM B SCORE
(MAX. POSS. 100)

FORM A (100 max.)    IRT    EQUIPERCENTILE    LINEAR* (TUCKER)

SPANISH LANGUAGE

EQUIVALENT FORM B SCORE
(MAX. POSS. 90)

FORM A (90 max.)



100, - 


100 


a 


/ 100(108) 


90 


90, 


92 1 


93 


93 


85 


80 


84 


84 , 


• 82 


80 


73 


77 


. 74 


75 


70 


65 ' 


* 68 . 


*66' 


66 • 


/ 60 


V 

57 


_ 58 


.58 


58 


I * 


50 


• 50 ^ 


50 


50 


50 


39 


38 


-37 


38 


45. 


35 


• ' 33 . 


-34 


34 


40 


30 